Energetic correlations due to polymeric constraints and the locality of interactions, in conjunction 
with the a priori specification of the existence of a particularly low energy state, provide a method of 
introducing the aspect of minimal frustration to the energy landscapes of random heteropolymers. 
The resulting funnelled landscape exhibits both a phase transition from a molten globule to a 
folded state, and the heteropolymeric glass transition in the globular state. We model the folding 
transition in the self-averaging regime, which together with a simple theory of collapse allows us to 
depict folding as a double-well free energy surface in terms of suitable reaction coordinates. Observed 
trends in barrier positions and heights with protein sequence length, stability, and temperature are 
explained within the context of the model. We also discuss the new physics which arises from the 
introduction of explicitly cooperative many-body interactions, as might arise from side-chain packing 
and non-additive hydrophobic forces. Denaturation curves similar to those seen in simulations are 
predicted from the model. 






I. INTRODUCTION 



Molecular scientists view protein folding as a complex chemical reaction. Another fruitful analogy from statistical 
physics is that folding resembles a phase transition in a finite system. A new view of the folding process combines these 
two ideas along with the notion that a statistical characterization of the numerous possible protein configurations is 
sufficient for understanding folding kinetics in many regimes. 

The resulting energy landscape theory of folding acknowledges that the energy surface of a protein is rough, 
containing many local minima like the landscape of a spin glass. On the other hand, in order to fold rapidly to a 
stable structure there must also be guiding forces that stabilize the native structure substantially more than other 
local minima on the landscape. This is the principle of minimum frustration [1]. The energy landscape can be said 
then to resemble a "funnel" [2]. Folding rates then depend on the statistics of the energy states as they become more 
similar to the native state at the bottom of the funnel. 

One powerful way of investigating protein energy landscapes has been the simulation of "minimalist" models. 
These models are not fully atomistic, but caricature the protein as a series of beads on a chain either embedded in 
a continuum [3] or on a lattice [4]. A correspondence, in the sense of phase transition theory, between these models 
and real proteins has been set up using energy landscape ideas [5]. Many issues remain to be settled, however, in 
understanding how these model landscapes and folding mechanisms change as the system under study becomes larger 
and as one introduces greater complexity into the modelling of this correspondence, as for example, by explicitly 
incorporating many-body forces and extra degrees of freedom. Simulations become cumbersome for such surveys, and 
an analytical understanding is desirable. 

Analytical approaches to the energy landscape of proteins have used many of the mathematical techniques used to 
treat spin glasses [6] and regular magnetic systems [7]. The polymeric nature of the problem must also be taken into 
account. Mean field theories based on replica techniques [8] and variational methods [9] have been very useful, but 
are more difficult to make physically intuitive than the straightforward approach of the random energy model [10], 
which flexibly takes into account many of the types of partial order expected in biopolymers [11]. Recently we have 
generalized the latter approach to take into account correlations in the landscape of finite-sized random heteropolymers [12]. 
This treatment used the formalism of the generalized random energy model (GREM) analyzed by Derrida 
and Gardner [13]. In this paper, we extend that analysis to take into account the minimum frustration principle and 
thereby treat protein-like, partially non-random heteropolymers. 

There are various ways of introducing the aspect of minimum frustration to analytical models with rugged landscapes. 
One way recognizes that many empirical potentials actually are obtained by a statistical analysis of a database, 
and when the database is finite, there is automatically an aspect of minimal frustration for any member of that 
database. Thus the so-called "associative memory" Hamiltonian models have co-existing funnel-like and rugged 
features in their landscape. Other methods of introducing minimal frustration model the process of evolution as giving 
a Boltzmann distribution over sequences for an energy gap between a fixed target structure and unrelated ones [15]. 
All of the above approaches can be straightforwardly handled with replica-based analyses. Here we show that the 
GREM analyses can be applied to minimally frustrated systems merely by requiring the energy of a given state to be 
specified as having a particularly low value. Minimally frustrated, funnelled landscapes are just a special case of the 
general correlated landscape studied earlier. 

A convenient aspect of the correlated landscape model is that it allows the treatment of the polymer physics in a 
very direct way, using simple statistical thermodynamics in the tradition of Flory [16]. Here we will show how the 
interplay of collapse and topological ordering can be studied. In order to do this we introduce a simple "core-halo" 
model to take into account the spatially inhomogeneous density. We will also discuss the role of many-body forces in 
folding. Explicitly cooperative many-body forces have often been involved in the thinking about protein structure 
formation. Hydrophobic forces are often modelled as involving buried surface area. Such an energy term is not 
pairwise-additive but involves three or more interacting bodies. Side-chain packing involves objects fitting into holes 
created by more than one other part of the chain, thus the elimination of side-chains from the model can yield an 
energy function for backbone units with explicit non-additivity. These many-body forces can be treated quite easily 
by the GREM, and we will see that they can make qualitative changes in the funnel topography. 

To illustrate the methods here, we construct two-dimensional free energy surfaces for the folding funnel of minimally 
frustrated polymers. These explicitly show the coupling between density and topological similarity in folding. 
We pay special attention to the location of the transition state ensemble and discuss how this varies with system size, 
cooperativity of interactions, and thermodynamic conditions. In the case of the 27-mer on a lattice, a detailed fit to 
the lattice simulation data [?] is possible. Although delicate cancellations of energetic and entropic terms are involved 
in the overall free energy, plausible parameters fit the data. 

The trends we see in the present calculations are very much in harmony with the experimental information on the 
nature and location of the transition state ensemble [17],[18]. We intend later to return to this comparison, especially 
taking into account more structural details within the protein. 

The organization of this paper is as follows: In Sec. 2 we introduce a theory of the free energy at constant 
density and in this context investigate the effects of cooperative interactions on the transition state ensemble and 
corresponding free energy barrier. In Sec. 3 we detail a simple theory coupling collapse with topological similarity and 
resulting in the "core-halo" model described there. In Sec. 4 we apply this collapse theory to obtain the free energy 
in terms of density and topological order, now coupled via the core-halo model. In the same section we compare our 
model of the minimally frustrated heteropolymer with lattice simulations of the 27-mer. In terms of the categorization 
of Bryngelson et al. [?] these free energy surfaces depict scenarios described as type I or type IIa folding. We then 
study the quantitative aspects of the barrier as a function of the magnitude of 3-body effects. The dependence of 
position and height of the barrier as a function of sequence length is studied, as well as the effects of increasing the 
stability gap. Finally, we study the denaturation curve as determined by the constant and variable density models. 
In Sec. 6 we discuss the results and conclude with some remarks. 



II. A THEORY OF THE FREE ENERGY 



In this section, we show how the a priori specification of the existence of a particularly low energy configuration, 
together with a theory of energy correlations for configurationally similar states, leads to a model for the folding 
transition and corresponding free energy surface in protein-like heteropolymers. This ansatz for the correlated energy 
landscape corresponds to the introduction of minimal frustration in a random energy landscape, where the order 
parameter here (which will function as a reaction coordinate for the folding transition) counts the number of native 
contacts or hydrogen bonds. 

The GREM theory for random heteropolymers developed by us earlier investigates the interplay between entropy 
loss and energetic roughness as a function of similarity to any given reference state, all at fixed density. However, 
for exceptional reference states such as the ground state of a well-packed protein, the density is not independent of 
configurational similarity so a theory of the coupling of density with topological similarity must also be developed 
(section 3). 

We start by assuming a simple "ball and chain" model for a protein which is readily comparable with simulations, 
for example of the 27-mer, which is widely believed to capture many of the quantitative aspects of folding (section 
4). Proteins with significant secondary structure have an effectively reduced number of interacting units as may be 
described by a ball and chain model. Properties of both, when appropriately scaled by critical state variables such 
as the folding temperature $T_F$, glass temperature $T_G$, and collapse temperature $T_C$, will obey a law of corresponding 
states [5]. Thus the behavior of a complicated real protein can be validly described by an order parameter 
applied to a minimal ball and chain model in the same universality class. For a 27-mer on a 3-dimensional cubic 
lattice, there are 28 contacts in the most collapsed (cubic) structure. For concreteness we take such a maximally 
compact structure to be the configuration of our ground state, the generalization to a less compact ground state being 
straightforward in the context of the model to be described. For a collapsed polymer of sequence length $N$, the number 
of pair contacts per monomer, $z_N$, is a combination of a bulk term, a surface term, and a lattice correction [19]. The 
effect of the surface on the number of contacts is quite important even for large macromolecules, as $z_N$ approaches 
its bulk value of 2 contacts per monomer rather slowly, as $\sim 2 - 3N^{-1/3}$. 

To describe states that are not completely collapsed, we introduce the packing fraction $\eta = N\sigma/R_g^3$ as a measure of 
the density of the polymer, where $\sigma$ is the volume per monomer and $R_g$ is the radius of gyration of the whole protein. 
So for less dense states the total number of contacts is reduced from its collapsed value $N z_N$ to $N z_N \eta$. 

In the spirit of the lattice model we have in mind for concreteness, we introduce a simple contact Hamiltonian to 
determine the energy of the system: 

$$ H = \sum_{i<j} \epsilon_{ij}\,\sigma_{ij} , \tag{II.1} $$

where $\sigma_{ij} = 1$ when there is a contact made between monomers $\{ij\}$ in the chain, and $\sigma_{ij} = 0$ otherwise. Here 
contact means that two monomers $\{ij\}$, non-consecutive in sequence along the backbone chain, are adjacent in space 
at neighboring lattice sites. $\epsilon_{ij}$ is a random variable so that, at constant density, the total energies of the various 
configurations, each the sum of many $\epsilon_{ij}$, are approximately Gaussian distributed by the central limit theorem, with 
mean energy (at a given density $\eta$) $\bar{E}_\eta = N z_N \eta\,\bar\epsilon$ (where $\bar\epsilon$ is simply defined as the mean energy per contact and $N z_N \eta$ 
is again the total number of contacts), and variance $\Delta E^2 = N z_N \eta\,\epsilon^2$, where $\epsilon^2$ is the effective width of the energy 
distribution per contact. 
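
The contact Hamiltonian (II.1) can be sketched for a chain on a cubic lattice as follows. This is a minimal illustration, not the code used for the simulations cited in the text: the helper names (`contact_pairs`, `contact_energy`, `snake_cube`) and the uniform contact energies are assumptions made here for concreteness.

```python
from itertools import combinations


def contact_pairs(coords):
    """Pairs (i, j), i < j, that sit on neighbouring lattice sites
    but are not consecutive along the backbone (sigma_ij = 1)."""
    pairs = []
    for i, j in combinations(range(len(coords)), 2):
        if j - i == 1:  # bonded neighbours along the chain are not contacts
            continue
        if sum(abs(a - b) for a, b in zip(coords[i], coords[j])) == 1:
            pairs.append((i, j))
    return pairs


def contact_energy(coords, eps):
    """H = sum_{i<j} eps[i][j] * sigma_ij, as in eq. (II.1)."""
    return sum(eps[i][j] for i, j in contact_pairs(coords))


def snake_cube(side=3):
    """Hypothetical maximally compact conformation: a back-and-forth
    path that fills a side x side x side cube."""
    path = []
    for z in range(side):
        ys = range(side) if z % 2 == 0 else range(side - 1, -1, -1)
        for y in ys:
            row = len(path) // side
            xs = range(side) if row % 2 == 0 else range(side - 1, -1, -1)
            path.extend((x, y, z) for x in xs)
    return path


cube = snake_cube()                           # 27-mer filling a 3x3x3 cube
uniform = [[-1.0] * 27 for _ in range(27)]    # illustrative contact energies
n_contacts = len(contact_pairs(cube))
energy = contact_energy(cube, uniform)
```

For the compact 27-mer the contact count computed this way can be compared with the 28 contacts quoted above; replacing `uniform` by random entries gives one realization of the random variables $\epsilon_{ij}$.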

Suppose there exists a configurational state $n$ of energy $E_n$ (which will later become the "native" state). Then if 
the Hamiltonian for our system is defined as in eq. (II.1), we can find the probability that configuration $a$ has energy 
$E_a$, given that $a$ has an overlap $Q_{an}$ with $n$ [12], where $Q_{an} = Q$ is the number of contacts that state $a$ has in common 
with $n$, divided by the total number of contacts $N z_N \eta$ (since this analysis is at constant density both $a$ and $n$ have 
$N z_N \eta$ contacts). This distribution is simply a Gaussian with a $Q$-dependent mean and variance: 

$$ \frac{P_{an}(E_a, Q, E_n)}{P_n(E_n)} \sim \exp\left( -\frac{\left[ (E_a - \bar{E}_\eta) - Q\,(E_n - \bar{E}_\eta) \right]^2}{2 N z_N \eta\,\epsilon^2 \left(1 - Q^2\right)} \right) . \tag{II.2} $$

When $Q = 1$ states $a$ and $n$ are identical and must then have the same energy, which (II.2) imposes by becoming a 
delta function, and when $Q = 0$ states $a$ and $n$ are uncorrelated and then (II.2) becomes the Gaussian distribution of 
the Random Energy Model for the energy of state $a$ [10]. Expression (II.2) holds for all states of the same density as $n$, 
e.g. all collapsed states if $n$ is the native state (the degree of collapse must be a somewhat coarse-grained description 
to avoid fluctuations due to lattice effects coupled with finite size). 

Previously a theory was developed of the configurational entropy $S_\eta(Q)$ as a function of similarity $Q$, at constant 
density $\eta$ [12]. Given $S_\eta(Q)$ and the conditional probability distribution (II.2), the average number of states of energy 
$E$ and overlap $Q$ with state $n$, all at density $\eta$, is then 

$$ \langle n_\eta(E, Q, E_n) \rangle = e^{S_\eta(Q)} \times \frac{P_{an}(E, Q, E_n)}{P_n(E_n)} \sim \exp N \left\{ s_\eta(Q) - \frac{1}{2\left(1 - Q^2\right)} \left[ \frac{(E - \bar{E}_\eta) - Q\,(E_n - \bar{E}_\eta)}{N J_\eta} \right]^2 \right\} , \tag{II.3} $$

where $J_\eta^2 = z_N \eta\,\epsilon^2$ and $s_\eta(Q) = S_\eta(Q)/N$ [20]. Equation (II.3) is still Gaussian with a large number of states 
provided $E > E_c$ (for negative energies), where $E_c = \bar{E}_\eta + Q\,(E_n - \bar{E}_\eta) - N J_\eta \sqrt{2\left(1 - Q^2\right) s_\eta(Q)}$ is a critical energy below which 
the exponent changes sign and the number of states becomes negligibly small. 

At temperature $T$, the Boltzmann factor $\sim e^{-E/T}$ weighting each state shifts the number distribution of energies 
so that the maximum of the thermally weighted distribution can be interpreted as the most probable (thermodynamic) 
energy at that temperature 

$$ E_\eta(T, Q, E_n) = \bar{E}_\eta + Q\,(E_n - \bar{E}_\eta) - \frac{N z_N \eta\,\epsilon^2 \left(1 - Q^2\right)}{T} . \tag{II.4} $$



The above expression for the most probable energy is useful provided the distribution (II.3) is a good measure of 
the actual number of states at $E$ and $Q$, the condition for which is that the fluctuations in the number of states be 
much smaller than the number of states itself. To this end, we make here the simplifying assumption that in each 
"stratum" defined by the set of states which have an overlap $Q$ with the native state, the states themselves are not 
further correlated with each other, i.e. $P(E_a, Q, E_b, Q \mid E_n) = P(E_a, Q \mid E_n)\,P(E_b, Q \mid E_n)$, so that in each stratum 
of the reaction coordinate $Q$, the set of states is modelled by a random energy model. Then since the number of 
states $n_\eta(E, Q, E_n)$ counts a collection of random uncorrelated variables, large when $E > E_c$, the relative fluctuations 
$\sqrt{\langle (n - \langle n \rangle)^2 \rangle} / \langle n \rangle$ are $\sim \langle n \rangle^{-1/2}$ and are thus negligible. So $n(E, Q, E_n) \approx \langle n(E, Q, E_n) \rangle$, and we can evaluate the 
exponent in the number of states (II.3) at the most probable energy (II.4) as an accurate measure of the ($Q$-dependent) 
thermodynamic entropy at temperature $T$ 

$$ S_\eta(T, Q, E_n) = S_\eta(Q) - \frac{N z_N \eta\,\epsilon^2 \left(1 - Q^2\right)}{2 T^2} . \tag{II.5} $$
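
As a numerical cross-check of (II.4) and (II.5), the sketch below evaluates the most probable energy and the corresponding entropy, and compares them with a direct evaluation of the Boltzmann-weighted exponent of (II.3). The function names and all parameter values are illustrative assumptions (units with $k_B = 1$), not values taken from the paper.

```python
def thermodynamic_energy(T, Q, E_n, E_bar, width2):
    """Most probable energy, eq. (II.4); width2 = N * z_N * eta * eps**2."""
    return E_bar + Q * (E_n - E_bar) - width2 * (1.0 - Q**2) / T


def thermodynamic_entropy(T, Q, S_Q, width2):
    """Entropy evaluated at the most probable energy, eq. (II.5)."""
    return S_Q - width2 * (1.0 - Q**2) / (2.0 * T**2)


def log_weighted_states(E, T, Q, E_n, E_bar, S_Q, width2):
    """Exponent of eq. (II.3) plus the Boltzmann weight -E/T."""
    x = (E - E_bar) - Q * (E_n - E_bar)
    return S_Q - x**2 / (2.0 * width2 * (1.0 - Q**2)) - E / T


# Made-up parameters for a 27-mer-like system.
N, z_N, eta, eps_bar, eps = 27, 28 / 27, 1.0, -1.0, 1.0
E_bar = N * z_N * eta * eps_bar          # mean energy at density eta
width2 = N * z_N * eta * eps**2          # variance of the state energies
E_n, S_Q, T, Q = 2.0 * E_bar, 20.0, 1.5, 0.4

E_star = thermodynamic_energy(T, Q, E_n, E_bar, width2)
S_star = thermodynamic_entropy(T, Q, S_Q, width2)
```

`E_star` should maximize `log_weighted_states` as a function of `E`, and `S_star` should equal that exponent with the Boltzmann term added back, which is the content of the step from (II.3) to (II.5).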

The assumption of a REM at each stratum of Q is clearly a first approximation to a more accurate correlational 
scheme. The generalization to treat each stratum itself as a GREM as in our earlier work is nevertheless straightforward, 
since our earlier work suggested only quantitative changes, which we will not pursue here. If two configurations 
$a$ and $b$ have overlap $Q$ with state $n$ and thus are correlated to $n$ energetically, they are certainly correlated to each 
other, particularly for large overlaps where the number of shared contacts is large. Using the REM scheme at each 
stratum is more accurate for small $Q$ and breaks down to some extent for large $Q$. In the ultrametric scheme of the 
GREM, states $a$ and $b$ have an overlap $q_{ab} \geq Q$, which is more accurate for large overlap than at small $Q$ where 
states $a$ and $b$ need not share any bonds and still can both have overlap $Q$ with $n$. One can also further correlate the 
energy landscape of states by stratifying with respect to $q_{ab} = q$ and so on, resulting in a hierarchy of overlaps and 
correlations best treated using renormalization group ideas. 

Just as the number of states (II.3) has a characteristic energy for which it vanishes, the REM entropy for a stratum 
at $Q$ (II.5) vanishes at a characteristic temperature 

$$ \frac{T_g(Q)}{\epsilon} = \sqrt{\frac{z_N \eta \left(1 - Q^2\right)}{2 s_\eta(Q)}} , \tag{II.6} $$

which signals the trapping of the polymer into a low energy conformational state within the stratum characterized by 
$Q$. 

If $T_g(Q)$ is a monotonically decreasing function of $Q$, as the temperature is lowered the polymer will gradually be 
thermodynamically confined in its conformational search to smaller and smaller basins of states. The basin around the 
native state is the largest basin with the lowest ground state, and hence is the first basin within which to be confined. 
Its characteristic size at temperature $T$ is just the number of states within overlap $Q(T)$, where $Q(T)$ is the value of 
overlap $Q$ that gives $T_g(Q) = T$ in equation (II.6). Thus there is now no longer a single glass temperature at which 
ergodic confinement suddenly occurs, as in the REM, but there is a continuum of basin sizes to be localized within 
at corresponding glass temperatures for those basins. 
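
The stratum glass temperature (II.6) and the basin overlap $Q(T)$ defined by $T_g(Q) = T$ can be sketched as below. The entropy function passed in is an assumption: `toy_entropy` is a made-up shape chosen only so that $T_g(Q)$ decreases monotonically, and it is not the entropy $s_\eta(Q)$ of reference [12]. The bisection in `basin_overlap` is likewise valid only in that monotonic case.

```python
import math


def glass_temperature(Q, s_of_Q, z_N, eta, eps):
    """Stratum glass temperature T_g(Q) from eq. (II.6), with k_B = 1."""
    s = s_of_Q(Q)
    if s <= 0.0:
        raise ValueError("entropy per monomer must be positive")
    return eps * math.sqrt(z_N * eta * (1.0 - Q**2) / (2.0 * s))


def basin_overlap(T, s_of_Q, z_N, eta, eps, tol=1e-10):
    """Solve T_g(Q) = T by bisection, assuming T_g decreases with Q."""
    lo, hi = 0.0, 1.0 - 1e-12
    if not (glass_temperature(hi, s_of_Q, z_N, eta, eps) <= T
            <= glass_temperature(lo, s_of_Q, z_N, eta, eps)):
        return None  # no stratum freezes at this temperature
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if glass_temperature(mid, s_of_Q, z_N, eta, eps) > T:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_entropy(Q, s0=1.0):
    """Hypothetical entropy shape, for illustration only."""
    return s0 * math.sqrt(1.0 - Q**2)


Q_T = basin_overlap(0.5, toy_entropy, z_N=1.0, eta=1.0, eps=1.0)
```

With this toy entropy $T_g(Q) \propto (1 - Q^2)^{1/4}$, so the result can be checked against the closed form $Q(T) = \sqrt{1 - (T/T_g(0))^4}$.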

If $T_g(Q)$ has a single maximum at say $Q^*$, the glass transition is characterized by a sudden REM-like freezing to 
a basin of configurations whose size is determined by $Q^*$. The range of glass temperatures will turn out to be lower 
than the temperature at which a folding transition occurs (see Fig. [?]), so that this model predicts a protein-like 
heteropolymer whose folded state is stable by several $k_B T$ at temperatures where freezing becomes important. A 
replica-symmetric analysis of the free energy is therefore sufficient to describe the folding transition to such deep 
native states that are minimally frustrated. 



From the thermodynamic expressions for the energy (II.4) and entropy (II.5) with the mean energy at density $\eta$, 
we can write down the free energy per monomer above the glass temperature as the sum of 4 terms 

$$ \frac{F}{N}(T, Q, E_n) = z_N \eta\,\bar\epsilon + Q z_N\,\delta\epsilon_n - T s_\eta(Q) - \frac{z_N \eta\,\epsilon^2}{2T} \left(1 - Q^2\right) , \tag{II.7} $$

where $z_N \delta\epsilon_n = z_N (\epsilon_n - \bar\epsilon) = (E_n - \bar{E}_\eta)/N$ is the extra energy for each bond beyond the mean homopolymeric attraction 
energy (the energy "gap" between an average molten globule structure and the minimally frustrated one), times the 
number of bonds per monomer, and $s_\eta(Q) = S_\eta(Q)/N$ is the entropy per monomer. 



The first term in (II.7) multiplied by $N$ is just the homopolymeric attraction energy between all the monomers, for 
a polymer of density $\eta$. It depends only on the degree of collapse, and not on how many contacts are native contacts. 
The second term is the average extra bias energy if a contact is native, times the average number of native contacts 
per monomer. The third term measures the equilibrium bias towards larger configurational entropy at smaller values 
of the reaction coordinate $Q$. The last term accounts for the diversity of energy states that exist on a rough energy 
landscape, the variance of which lowers thermodynamically the energy more than the entropy, and so lowers the 
equilibrium free energy. 
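
The four contributions to (II.7) described above can be written out term by term. The sketch below is illustrative only: the function names are assumptions, the entropy per monomer at the chosen $Q$ is passed in as a number rather than computed, and `gap` stands for $\delta\epsilon_n$ (negative for a stabilizing gap).

```python
def free_energy_terms(T, Q, s_Q, z_N, eta, eps_bar, gap, eps):
    """The four per-monomer terms of eq. (II.7), in the order of the text."""
    homopolymer = z_N * eta * eps_bar                          # collapse energy
    native_bias = Q * z_N * gap                                # extra energy of native contacts
    entropic = -T * s_Q                                        # configurational entropy
    roughness = -z_N * eta * eps**2 * (1.0 - Q**2) / (2.0 * T)  # landscape ruggedness
    return homopolymer, native_bias, entropic, roughness


def free_energy_per_monomer(T, Q, s_Q, z_N, eta, eps_bar, gap, eps):
    """F/N as the sum of the four terms."""
    return sum(free_energy_terms(T, Q, s_Q, z_N, eta, eps_bar, gap, eps))


# Made-up parameters, k_B = 1.
params = dict(T=1.5, Q=0.4, s_Q=0.7, z_N=28 / 27, eta=1.0,
              eps_bar=-1.0, gap=-1.8, eps=1.0)
F_over_N = free_energy_per_monomer(**params)
```

Keeping the terms separate makes it easy to see which contribution drives a change in $F/N$ when $Q$, $T$, or the density is varied.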



For a special surface in $(\delta\epsilon_n, \epsilon, T)$ space, expression (II.7) has a double minimum structure in the reaction coordinate 
$Q$, with one entropic minimum at low $Q$ corresponding to the "molten globule" state, separated by a barrier from 
an energetic minimum at high $Q$ corresponding to a "folded" state. For a given temperature, values of $\delta\epsilon_n$ and 
$\epsilon^2$ can be obtained which are reasonably close to the values obtained by a more accurate analysis which includes 
the coupling of density with topology, but we will not examine the constant density case in much detail for reasons 
discussed below, except to remark that 1.) The true coupling between density and $Q$-constraints need not be strong 
to obtain a double-well free energy structure. 2.) For monomeric units with pair interactions, the molten globule and 
folded minima are not at $Q = 0$ and 1 respectively. The position of the molten globule state is near the maximum 
of the entropy of the system, which is at $Q = 0.1$ for the 27-mer due to the interplay of confinement effects and the 
combinatorial mixing entropy inherent in the "coarse-grained" description $Q$ [12]. The native minimum shifts to $Q = 1$ 
when many-body interactions are introduced (see the next section). 3.) The barrier height, at position $Q^* = 0.25$ 
for the 27-mer with protein-like parameters ($T_F/T_G = 2$), is small ($\Delta F^* \approx k_B T_F$), due to the effective cancellation 
of entropy loss by negative energy gain, as the system moves toward the native state (this cancellation is reduced 
when many-body forces are taken into account). 4.) When a linear form for the entropy is used in equation (II.7), 
e.g. $s(Q) = s_0 (1 - Q)$ instead of the more accurate $s(Q)$ obtained in reference [12], the double minimum structure 
disappears and is replaced by a single minimum near $T_F$ at $Q \approx 1/2$, with the $Q = 0$ and $Q = 1$ states becoming free 
energy maxima. So folding is downhill or spinodal-like in this approximation. 
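
Remark 4 can be checked numerically. With the linear entropy $s(Q) = s_0(1 - Q)$ at $\eta = 1$, expression (II.7) is a parabola in $Q$ with positive curvature $z_N \epsilon^2 / T$, so it has a single minimum. The sketch below uses made-up parameter values and picks the gap so that $F(Q=0) = F(Q=1)$, which is taken here as the definition of the folding temperature.

```python
def free_energy_linear(Q, T, z_N, gap, eps, s0):
    """Eq. (II.7) at eta = 1 with s(Q) = s0 * (1 - Q); the Q-independent
    homopolymer term is dropped. gap = delta eps_n (negative)."""
    return (Q * z_N * gap
            - T * s0 * (1.0 - Q)
            - z_N * eps**2 * (1.0 - Q**2) / (2.0 * T))


z_N, eps, s0, T = 1.0, 1.0, 1.0, 1.5   # illustrative values, k_B = 1
# Choose the gap so that the Q = 0 and Q = 1 free energies coincide at this T.
gap = -(T * s0 + z_N * eps**2 / (2.0 * T)) / z_N

grid = [i / 1000 for i in range(1001)]
values = [free_energy_linear(Q, T, z_N, gap, eps, s0) for Q in grid]
Q_min = grid[values.index(min(values))]
```

On this grid the minimum sits at the midpoint and the two end points are the highest values, consistent with the downhill picture described in the text for the linear-entropy approximation.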


A. Effects of cooperative interactions 



As the interactions between monomeric segments become more explicitly cooperative, the energetic correlations 
between states become significant only at greater similarity, with the system approaching the REM limit for $\infty$-body 
interactions, where the statistical energy landscape assumes a rough "golf-course" topography with a steep funnel 
close to the native state. 

In the presence of many-body interactions, the homopolymer collapse energy also scales as a higher power of density 
($\sim \bar\epsilon\,\eta^{m-1}$). For even moderate $m \sim O(1)$ a first-order phase transition to collapsed states results, which effectively 
confines all reaction paths in the coordinate $Q$ between molten globule and folded states to those where the density 
is constant and $\approx 1$. So within this constant density approximation we can investigate the nature of the folding 
transition as a function of the cooperativity of the interactions, and see how the correlated landscape simplifies to the 
REM in the limit of $m$-body interactions with large $m$. 

In the presence of $m$-body interactions, the $Q$ dependence in the pair energy distribution (II.2) scales with $Q$ as 
$Q^{m-1}$. Using this modified pair distribution along with the collapsed homopolymeric state as our zero point energy, 
the free energy (II.7) becomes 

$$ \frac{F}{N}(T, Q, E_n) = -T s_1(Q) - Q^{m-1} z_N |\delta\epsilon_n| - \frac{z_N \epsilon^2}{2T} \left(1 - Q^{2(m-1)}\right) , \tag{II.8} $$

where $s_1(Q)$ is the entropy as a function of constraint $Q$ for a fully collapsed polymer. For pure 3-body interactions 
and higher, the globule and folded states are very nearly at $Q = 0$ and $Q = 1$ respectively (see figure 1A). To the 
extent that this approximation is good, we can equate the free energies of the molten globule and folded structures and 
obtain an $m$-independent folding temperature (note again that this is not a good approximation for pair interactions): 

$$ T_F = \frac{z_N |\delta\epsilon_n|}{2 s_0} \left[ 1 + \sqrt{1 - \frac{2 s_0\,\epsilon^2}{z_N\,\delta\epsilon_n^2}} \right] , \tag{II.9} $$

where $s_0$ is the maximum of the entropy as a function of constraint $Q$ (essentially the log of the total number of 
configurations). 

From expression (II.9) we can obtain a first approximation to the constraint on the magnitude of the gap energy 
$\delta\epsilon_n$ in order to have a global folding transition (rather than merely a local glass transition) to the low energy state in 
question. The condition that the square root term in eq. (II.9) be real gives the minimum gap for global foldability 
in terms of the roughness $\epsilon$: 

$$ \frac{|\delta\epsilon_n^{c}|}{\epsilon} = \sqrt{\frac{2 s_0}{z_N}} \approx \sqrt{2} , \tag{II.10} $$

where the minimum folding temperature is then $k_B T_F^{c} = z_N |\delta\epsilon_n^{c}| / (2 s_0)$ (or equivalently, one can obtain the maximum 
roughness for foldability as $\approx 1/\sqrt{2}$ of a given gap energy). For typical proteins (with folding temperatures at 
330 K) gap energies are (at least) $\approx 1$ kcal/mol-(lattice unit). Note that eq. (II.10) is precisely the same result, as 
it should be, as that obtained previously [22] in the context of finding optimal folding energy functions, by requiring 
the quantity $T_F/T_G > 1$, where the glass temperature $T_G = \sqrt{z_N \epsilon^2/(2 s_0)}$ is evaluated at the molten globule overlap 
$Q = Q_g$. From these arguments one can see that the distinction between the folding transition and the glass transition 
is a quantitative one characterized by the distinction between global and local basin sizes, but a crucial one for the 
sub-class of biological heteropolymers. 
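
The folding temperature (II.9), the critical gap (II.10), and the ratio $T_F/T_G$ can be evaluated together. The sketch below takes $Q = 0$ and $Q = 1$ as the two minima, as in the text, uses units with $k_B = 1$, and the parameter values $z_N = \epsilon = s_0 = 1$ are illustrative assumptions rather than fitted values.

```python
import math


def glass_temperature(z_N, eps, s0):
    """T_G = sqrt(z_N * eps**2 / (2 * s0)), the molten-globule glass temperature."""
    return eps * math.sqrt(z_N / (2.0 * s0))


def critical_gap(z_N, eps, s0):
    """Minimum |delta eps_n| for a global folding transition, eq. (II.10)."""
    return eps * math.sqrt(2.0 * s0 / z_N)


def folding_temperature(z_N, gap, eps, s0):
    """Larger root of F(Q=0) = F(Q=1), eq. (II.9); None below the critical gap."""
    disc = 1.0 - 2.0 * s0 * eps**2 / (z_N * gap**2)
    if disc < 0.0:
        return None
    return z_N * abs(gap) / (2.0 * s0) * (1.0 + math.sqrt(disc))


z_N, eps, s0 = 1.0, 1.0, 1.0
gap_c = critical_gap(z_N, eps, s0)
T_G = glass_temperature(z_N, eps, s0)
T_F = folding_temperature(z_N, 2.0, eps, s0)   # a gap above the critical value
ratio = T_F / T_G
```

Gaps below `gap_c` return `None` (no real folding temperature), while larger gaps give $T_F/T_G > 1$, which is the foldability criterion discussed above.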



Evaluating $F(T, Q, E_n)/N$ (eq. (II.8)) with protein-like energetic parameters at the folding temperature $T_F$, we 
obtain free energy curves as in Fig. 1A, plotted for example with $m = 3$ and $m = 12$, for a 27-mer lattice protein. 

Note that the transition state ensemble (the collection of states at $Q = Q^*$, where $Q^*$ is where the free energy is a 
maximum) becomes more and more native-like (and thus the ensemble becomes smaller and smaller, eventually going 
to 1 state in the REM) as the energy correlations become more short-ranged in $Q$ (i.e. as $m$ increases); see figure 1B. 
The corresponding free energy barrier then grows with $m$ as the energetic bias ($\sim Q^{m-1}$) overcomes the entropic 
barrier only much closer to the native state, and the barrier becomes more and more entropic and less energetic (see 
fig. [?]). 
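
The trend with $m$ can be illustrated by locating the maximum of (II.8) on a grid. This is a sketch under a strong assumption: the collapsed-state entropy $s_1(Q)$ is replaced by the linear stand-in $s_0(1 - Q)$, which is not the entropy used for the figures, so only the qualitative behavior is meaningful. The parameter values ($z_N = \epsilon = s_0 = 1$, $T_F/T_G = 2$) are likewise chosen for illustration.

```python
import math


def free_energy_m_body(Q, T, m, z_N, gap, eps, s0):
    """Eq. (II.8) with the stand-in entropy s_1(Q) = s0 * (1 - Q);
    gap = |delta eps_n| (positive)."""
    return (-T * s0 * (1.0 - Q)
            - Q**(m - 1) * z_N * gap
            - z_N * eps**2 * (1.0 - Q**(2 * (m - 1))) / (2.0 * T))


def barrier(T, m, z_N, gap, eps, s0, n=2001):
    """Grid search for the barrier position Q* and its height above Q = 0."""
    grid = [i / (n - 1) for i in range(n)]
    values = [free_energy_m_body(Q, T, m, z_N, gap, eps, s0) for Q in grid]
    top = max(range(n), key=values.__getitem__)
    return grid[top], values[top] - values[0]


z_N, eps, s0 = 1.0, 1.0, 1.0
T_G = eps * math.sqrt(z_N / (2.0 * s0))
T_F = 2.0 * T_G                                   # protein-like ratio T_F / T_G = 2
# Gap fixed by F(Q=0) = F(Q=1) at T_F, which is independent of m.
gap = (T_F * s0 + z_N * eps**2 / (2.0 * T_F)) / z_N

Q3, dF3 = barrier(T_F, 3, z_N, gap, eps, s0)
Q12, dF12 = barrier(T_F, 12, z_N, gap, eps, s0)
```

Under these assumptions the barrier for $m = 12$ lies closer to $Q = 1$ and is higher than for $m = 3$, in line with the trend described in the text.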

As was already mentioned, the above analysis was for a polymer of constant collapse density. However, numerical 
evidence for lattice models of protein-like heteropolymers suggests a coupling of density with nativeness, with 
energetically favorable native-like states typically being denser. So to this end we now investigate in detail a simple 
interpolative theory coupling collapse density $\eta$ with nativeness $Q$, assuming a native ($Q = 1$) state which is 
completely collapsed ($\eta = 1$). Including this effect in eq. (II.7) will complete our simple model of the folding funnel 
topography in two reaction-coordinate dimensions. 



III. A THEORY OF COLLAPSE 



At low degrees of nativeness, we expect that collapse should be roughly homogeneous throughout the polymer, so 
that the density $\eta$ should depend only on the total number of contacts $Z = N\bar{z}$, where $\bar{z}$ is the average number of 
contacts per monomer. This is the case if we straightforwardly apply $N z_N \eta = N\bar{z}$, so that $\eta = \bar{z}/z_N$, and any theory 
of $\eta$ must have this form in the low $Q$ limit (as well as $\eta \to 1$ as $Q \to 1$). Now as we progress towards more native 
structures (higher $Q$), we should introduce a model that distinguishes between the constrained native structures or 
regions (in space) fixed or "frozen" by virtue of their native contacts, and those non-native regions, typically less 
dense, and not constrained by any native contacts. This model will impose a $Q$ dependence on $\eta$ by ascribing different 
densities to the collapsed, native "core" region(s) and the non-native, less dense "halo" region(s). We adopt the 
simplest model that there is one native core region of density $\eta = 1$, surrounded by a halo region with $\eta_H < 1$ (see 
fig. [?]). To find the $Q$ and $\bar{z}$ dependence of $\eta_H$, we see that the total number of contacts $N\bar{z}$ is the sum of two terms 

$$ N\bar{z} = N_C z_{N_C} + N_H \eta_H z_{N_H} . \tag{III.1} $$

The first term is the number of contacts inside the native core region, where $N_C$ is the number of frozen monomers and 
$z_{N_C}$ is the number of contacts per monomer in the core. The number of contacts per monomer in a three dimensional 
collapsed walk of length $N$, mentioned above in Sec. II, is given approximately by 

$$ z_N = \frac{1}{N}\,\mathrm{Int}\left[ 2N - 3 (N + 1)^{2/3} + 3 \right] , \tag{III.2} $$

where Int[...] means take the integer part. So the number of contacts per monomer in the core is eq. (III.2) with $N$ 
replaced by the number of core monomers, $N_C$. The second term is the total number of contacts in the halo, where 
$N_H$ is the number of monomers in the halo, $N - N_C$, and $\eta_H z_{N_H}$ is the approximate number of contacts per monomer. 
The packing fraction $\eta_H$ can be interpreted as the probability that a monomer is within a region of space $\Delta\tau = b^3$, 
where $b^3$ is the size of a monomer, which reduces the number of contacts per monomer from its collapsed value, $z_{N_H}$, 
to $\eta_H z_{N_H}$. $z_{N_H}$ is eq. (III.2) with $N \to N_H$, which accounts for the fact that the halo has both an inner and outer 
surface. Contacts at the interface of the core and halo are neglected. 
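
Formula (III.2) is simple to evaluate directly. In the sketch below, treating $N$ as the number of steps of the walk (so that a 27-mer corresponds to $N = 26$) is an assumption made here because it reproduces the 28 contacts quoted in Sec. II for the compact 27-mer; the function names and the small guard against floating-point round-off are also additions for illustration.

```python
import math


def total_contacts(N):
    """Int[2N - 3(N + 1)^(2/3) + 3] from eq. (III.2).
    The 1e-9 offset only protects exact integers from round-off."""
    return math.floor(2 * N - 3 * (N + 1) ** (2.0 / 3.0) + 3 + 1e-9)


def contacts_per_monomer(N):
    """z_N for a collapsed walk of length N."""
    return total_contacts(N) / N


compact_27mer = total_contacts(26)       # N taken as the number of steps
z_large = contacts_per_monomer(10**6)    # slow approach to the bulk value 2
```

For large $N$ the result can be compared with the asymptotic form $2 - 3N^{-1/3}$ given earlier, which shows how slowly the bulk value of 2 is approached.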

Next we assume that basically all the native contacts are made in the dense core, so that $Q$, the number of native 
contacts over the total number of possible native contacts, is given by $N_C z_{N_C} / N z_N$. Then using $N_H = N - N_C = 
N - N z_N Q / z_{N_C}$, we can express $\eta_H$ as a function of $Q$ and $\bar{z}$ 

$$ \eta_H(Q, \bar{z}) = \frac{\bar{z} - z_N Q}{z_{N_H} \left( 1 - z_N Q / z_{N_C} \right)} . \tag{III.3} $$

The condition $\eta_H > 0$ corresponds to the condition that the number of native bonds $N z_N Q$ cannot exceed the total 
number of bonds $N\bar{z}$. Note that $\eta_H \to \bar{z}/z_N = \eta_{tot}$ at low nativeness (small $Q$), where the polymer is almost all halo. 
As the polymer becomes native-like ($Q \to 1$), contacts at the surface between the core and the halo become more 
important, and the simple theory begins to break down. 
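
The halo density follows from solving the contact balance (III.1) for $\eta_H$ with $N_C z_{N_C} = Q N z_N$. In the sketch below the three coordination numbers are passed in as plain arguments rather than computed from (III.2), and the numerical values are made up; the point is only to check that the resulting expression reproduces the contact balance and the limits stated in the text.

```python
def halo_density(Q, z_bar, z_N, z_NC, z_NH):
    """Halo packing fraction eta_H(Q, z_bar) obtained from eq. (III.1),
    assuming all native contacts lie in the core."""
    core_fraction = z_N * Q / z_NC            # N_C / N
    if core_fraction >= 1.0:
        raise ValueError("core would contain the whole chain")
    return (z_bar - z_N * Q) / (z_NH * (1.0 - core_fraction))


# Illustrative numbers only.
N, Q, z_bar, z_N, z_NC, z_NH = 100, 0.3, 1.0, 1.5, 1.2, 1.4
eta_H = halo_density(Q, z_bar, z_N, z_NC, z_NH)
N_C = N * z_N * Q / z_NC
N_H = N - N_C
balance = N_C * z_NC + N_H * eta_H * z_NH     # should equal N * z_bar
```

A negative return value signals that the requested number of native contacts exceeds the total number of contacts, which is the condition $\eta_H > 0$ discussed above.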

In the next section, we obtain the free energy in terms of the reaction coordinates $Q$ and $\bar{z}$ through the introduction 
of the halo density obtained above, but we can now re-investigate the glass transition temperature as a function of 
both $Q$ and $\bar{z}$ through the insertion of the halo density $\eta_H(Q, \bar{z})$ into (II.6), giving the regions in the space of these 
reaction coordinates where the dynamics would tend to become glassy if $T_g(Q, \bar{z})$ were comparable to $T_F$ (see fig. [?]). 
The values of $T_F/T_G(Q, \bar{z})$ are always greater than $\approx 1.6$, as one can see from the figure, and so the assumption of 
self-averaging used in section 2 is valid here, with the exception of very native-like states (at high $Q$ and $\bar{z}$), where 
the free energy becomes strongly sequence dependent for a finite size polymer. 

Of course $T_F/T_G(Q, \bar{z})$ is a rather crude measure of self-averaging, and a more rigorous method would be to follow 
the calculations by Derrida and Toulouse [23] of the moments of the probability distribution of $Y = \sum_j W_j^2$, measuring 
the sample to sample fluctuations of the sum of weights of the free energy valleys, and generalize them to finding the 
probability distribution of $Y(Q, \bar{z})$. 

The shape of the free energy surface is very sensitive to the form of $\eta_H(Q, \bar{z})$, and this model of collapse represents 
one of the cruder approximations of the theory. It predicts a weakly decreasing halo density as $Q$ increases, and 
predicts at $T_F$ a folded state with a significant halo that has fewer contacts overall than the molten globule (MG) state (although 
there is of course a much larger native core). This over-expansion of the halo compensates entropically for the large 
loss in entropy due to the $N z_N Q$ constraints. 

IV. THE DENSITY-COUPLED FREE ENERGY



The halo density $\eta_H(Q,z)$ will appear in the roughness term of (II.7) since this term arises as a result of non-native
interactions which contribute to the total variance of state energies. $\eta_H$ would also appear in the entropic contribution
in (II.7) because the parts of the polymer contributing to the entropy are the dangling loops or pieces constituting
the halo, the frozen native core being fully constrained. However, slightly more accurate values are obtained, specifically for
the barrier position $Q^*$, if in the entropic term we use an interpolation between the low $Q$ formula of the density
(to which it simplifies anyway at weak topological constraint) and the core-halo formula (III.3) at high $Q$:



$$\eta^{int}(Q,z) = (1-Q)\,\frac{z}{z_N} + Q\,\eta_H(Q,z) ,$$

with $\eta_H(Q,z)$ given by (III.3).



This gives more weight to the two behaviors in their respective regimes: mean-field uniform density at weak constraint
or low $Q$, and core-halo behavior at strong constraint. The pure halo density formula (III.3) is still used in the
roughness term (but this is not a crucial point). The total density $\eta_{tot} = z/z_N$ appears in the homopolymeric term
since this energetic contribution is a function only of the number of contacts, irrespective of whether they are native
or not. The extra gap energy, conveniently defined with respect to fully collapsed states in (II.7), is an energetic
contribution added to each native bond formed, independent of $z$, up to the limit $Q z_N = z$ set by (III.3), where the
gap term in (II.7) becomes simply $z\,\delta\epsilon_n$.

These substitutions in (II.7) describe a free energy surface as a function of the reaction coordinates $Q$ and $z$ -
essentially the native contacts and the total contacts:



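$$\frac{F}{N}(T,Q,z\,|\,E_n) = -z\,|\bar\epsilon| - Q z_N\,|\delta\epsilon_n| - T s(Q,z) - \frac{z_N\,\eta_H(Q,z)\,\epsilon^2}{2T}\left(1-Q^2\right) \qquad \text{(IV.1)}$$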



where $s(Q,z) = s(Q, \eta^{int}(Q,z))$ is the entropy evaluated at the interpolated density. The first term is an equilibrium
bias towards states that simply have more contacts and depends only on $z$, whereas the second term is a bias towards states
with greater nativeness and depends only on $Q$, although the maximum value of this bias does depend on $z$ as explained above.
The entropic term biases the free energy minimum towards small values of both $Q$ and $z$, where the entropy is largest.
The energetic parameter $T$ determining the weight of this term is held fixed at a value $T_F$ described below, and the other
energetic parameters ($\bar\epsilon$, $\delta\epsilon_n$, and $\epsilon$) are adjusted so as to give the free energy a double
well structure with folded and unfolded minima of equal depth. The free energy bias due to landscape roughness is largest when
there are many non-native contacts ($z$ is large and $Q$ is small), which means that the protein can find itself in non-native
low energy states due to the randomness of those non-native interactions.
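As a rough numerical illustration, a free energy of this four-term form can be evaluated pointwise once an entropy function and a halo density are supplied. The sketch below is a minimal Python version under stated assumptions: the function and argument names are hypothetical, `entropy` stands in for $s(Q,\eta)$ and `halo_density` for eq. (III.3) (neither is reproduced here), the entropic term uses a linear interpolation $(1-Q)\,z/z_N + Q\,\eta_H$ of the density, and the roughness prefactor $z_N \eta_H \epsilon^2/2T$ is assumed from the verbal description of the roughness term rather than taken from a printed formula.

```python
def free_energy_per_monomer(T, Q, z, eps, eps_bar, d_eps_n, z_N,
                            entropy, halo_density):
    """Sketch of F/N at temperature T and reaction coordinates (Q, z).

    entropy(Q, eta) and halo_density(Q, z) are caller-supplied callables;
    they are hypothetical placeholders, not the forms used in the paper.
    """
    eta_tot = z / z_N                              # total density
    eta_H = halo_density(Q, z)                     # halo density
    eta_int = (1.0 - Q) * eta_tot + Q * eta_H      # assumed interpolation
    homopolymer = -z * abs(eps_bar)                # bias towards more contacts
    # Gap bias per native bond, capped where native contacts exhaust z.
    gap = -min(Q * z_N, z) * abs(d_eps_n)
    entropic = -T * entropy(Q, eta_int)
    # Assumed roughness prefactor: z_N * eta_H * eps^2 / (2T).
    roughness = -(z_N * eta_H * eps ** 2) / (2.0 * T) * (1.0 - Q ** 2)
    return homopolymer + gap + entropic + roughness


def toy_entropy(Q, eta):
    """Constant stand-in entropy, for illustration only."""
    return 0.3


def toy_halo_density(Q, z):
    """Constant stand-in halo density, for illustration only."""
    return 0.5


value = free_energy_per_monomer(T=1.5, Q=0.25, z=1.0, eps=0.9,
                                eps_bar=-2.8, d_eps_n=-1.6, z_N=2.0,
                                entropy=toy_entropy,
                                halo_density=toy_halo_density)
print(value)
```

Passing the entropy and halo density as callables keeps the sketch independent of the particular collapse model, so a different halo theory can be swapped in without touching the energetic terms.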



A. Comparison with a Simulation 



The 27-mer lattice model protein has been simulated for polymer sequences designed to show minimal frustration. |I|, [24]
The system we are interested in is modelled by a contact Hamiltonian as in (II.1), but now the beads
representing the monomers are of 3 different kinds with respect to their energies of interaction. If like monomers are
in contact, they have an energy $\epsilon_{ij} = -3$, otherwise $\epsilon_{ij} = -1$, where the interaction energies are in an
arbitrary scale of units of order $k_BT$. This specific sequence is modelled to have a fully collapsed "native" state with a
specific set of 28 contacts and a ground state energy of $-3 \times 28$.
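The contact counting behind these numbers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the compact conformation is a simple "snake" walk filling a 3 x 3 x 3 cube, not the native structure used in the simulations, and the two test sequences (all one kind, and alternating kinds) are hypothetical, chosen so that every contact is like or unlike respectively.

```python
LIKE_ENERGY = -3.0    # contact between monomers of the same kind
UNLIKE_ENERGY = -1.0  # contact between monomers of different kinds


def snake_cube(n=3):
    """Return a compact self-avoiding walk that fills an n x n x n cube."""
    path = []
    for z in range(n):
        ys = list(range(n))
        if z % 2 == 1:
            ys.reverse()                   # reverse y direction on odd layers
        for y in ys:
            xs = list(range(n))
            if (len(path) // n) % 2 == 1:  # reverse x direction every other row
                xs.reverse()
            for x in xs:
                path.append((x, y, z))
    return path


def contacts(path):
    """Non-bonded nearest-neighbour pairs (i, j) with j > i + 1."""
    pairs = []
    for i in range(len(path)):
        for j in range(i + 2, len(path)):
            dist = sum(abs(a - b) for a, b in zip(path[i], path[j]))
            if dist == 1:
                pairs.append((i, j))
    return pairs


def contact_energy(kinds, pairs):
    """Sum the pair energies over a list of contacts."""
    return sum(LIKE_ENERGY if kinds[i] == kinds[j] else UNLIKE_ENERGY
               for i, j in pairs)


compact = snake_cube(3)
compact_contacts = contacts(compact)
uniform = [0] * 27                        # every contact is "like"
alternating = [i % 2 for i in range(27)]  # every contact is "unlike"

print(len(compact_contacts))                          # 28
print(contact_energy(uniform, compact_contacts))      # -84.0
print(contact_energy(alternating, compact_contacts))  # -28.0
```

The count of 28 follows from the 54 nearest-neighbour pairs of the cube minus the 26 chain bonds, and $-84 = -3 \times 28$ is the energy when all 28 contacts are between like monomers.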

In the thermodynamic limit, the discrete interaction energies used in the simulation give a Gaussian distribution for
the total energy of the system by the central limit theorem, whose mean and width naturally depend on the fraction
of native contacts.

If we call $Z$ the total number of contacts of any kind, the energy at $Q$ and $Z$ is determined simply by the energies
of these native and non-native contacts above, while the entropy at high temperatures is the log of the number of
states satisfying the constraints of $Z$ total contacts and $\mu$ native contacts. However, the temperature range where
folding occurs is well below the temperature of homopolymeric collapse, and so the polymer can be considered to be
largely collapsed. This can be seen either by direct computation or by computing the entropy [24], defined through

$$S(Q,z,T) = -\sum_i p_i \log p_i = -\sum_i \frac{e^{-E_i/T}}{Z_\nu}\,\log\left(\frac{e^{-E_i/T}}{Z_\nu}\right)$$

where $Z_\nu$ is the (partial) partition function, the sum being over all of the states consistent with the constraints
characterized by $\mu$ and $Z$ above.
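For a finite list of state energies this entropy is straightforward to evaluate numerically. The following is a minimal sketch, assuming units with $k_B = 1$ and a hypothetical helper name `thermal_entropy`; it takes the energies of the states compatible with a given pair of constraints and returns the Gibbs entropy, the mean energy, and $F = E - TS$.

```python
import math


def thermal_entropy(energies, T):
    """Return (S, E, F) for a list of state energies at temperature T (k_B = 1)."""
    e_min = min(energies)
    # Shift by the minimum energy so the exponentials cannot overflow.
    weights = [math.exp(-(E - e_min) / T) for E in energies]
    z_shifted = sum(weights)                  # partial partition function (shifted)
    probs = [w / z_shifted for w in weights]  # Boltzmann probabilities p_i
    S = -sum(p * math.log(p) for p in probs if p > 0.0)
    E_avg = sum(p * E for p, E in zip(probs, energies))
    F = E_avg - T * S
    return S, E_avg, F


# Four degenerate states: the entropy is log(4) at any temperature.
S, E, F = thermal_entropy([-10.0, -10.0, -10.0, -10.0], T=1.5)
print(S, E, F)
```

Because the probabilities are Boltzmann weights, the returned $F$ coincides with $-T \log Z_\nu$ over the same set of states, which gives a simple consistency check.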

We can now easily obtain the free energy as $F = E - TS$, shown in Fig. 4A for the 27-mer as a surface plot vs.
the total number of contacts per monomer $z = Z/N$, and $Q$ = (total number of native bonds)/28, the number of
native contacts over the total number of possible contacts. The largest value of $Q$ for a given $Z$ is $Z/28$, because
there cannot be more native contacts than there are total contacts, hence the allowable region is the upper left of the
surface plot. The surface plot in Fig. 4A has a double minimum structure at a specific temperature $T_F = 1.51$ on the
energy scale where $\epsilon_{ij} = \{-3,-1\}$ described above. The free energy barrier of $\approx 2k_BT_F$ is small compared with the
entropic barrier of the system ($\sim 14k_BT_F$). The transition ensemble at reaction coordinates $(Q^*,z^*) = (0.54, 0.88)$
consists of about $\exp[N s(Q^*,z^*)] \approx 2{,}000$ thermally occupied states and $\sim 10^5$ configurational states.

There are 4 energetic parameters in the free energy theory ($\epsilon$, $\bar\epsilon$, $\delta\epsilon_n$, and $k_BT_F$), and 3 parameters in the simulation
($\epsilon$(like units), $\epsilon$(unlike units), and $k_BT_F$), plus the roughness parameter, which is implicitly evaluated through the
diversity of energies consistent with overlap $Q$. It is worth noting that the minimal frustration in the lattice simulation
is implicit in the sequence design, in that the ground state is topologically consistent with all the pair interactions
between like monomers. However, the gap energy in the simulation is functionally different from that of the theoretical model:
in the simulation, contacts between like monomers are always favored whether native or not, whereas in the theory only true native
contacts have explicit contributions to the energy gap. This means that denser states are weighted more strongly in
the simulation than in the theory, and thus we may expect our homopolymer attraction parameter $\bar\epsilon$ to be somewhat
larger in magnitude than the simulational average ($|\bar\epsilon| \approx 2$) for the same $T_F$.

We do not undertake here a comparison of simulations at all parameter values with theory. Rather, we compare
simulations and theory only for the 27-mer, with parameters chosen to be protein-like according to the corresponding
states principle analysis of Onuchic et al. S. The scheme for comparison between the simulations and theory for
the 27-mer is to hold $T_F$ fixed at the simulational value of 1.5, and then determine the remaining three energetic
parameters ($\bar\epsilon$, $\delta\epsilon_n$, $\epsilon$) by appropriate constraints.

One systematic method of finding protein-like energy parameters is to assume the folded state is the native state,
and solve three linear equations in the energy parameters determined by the condition of folding equilibrium

$$F(Q_g, z_g) = F(Q_{fold}, z_{fold}) = F(1, z_N) ,$$

and the conditions that the molten globule at $(Q_g, z_g)$ is a free energy minimum (or saddle point)

$$\left.\frac{\partial F}{\partial Q}\right|_{(Q_g,z_g)} = 0 \quad \text{and} \quad \left.\frac{\partial F}{\partial z}\right|_{(Q_g,z_g)} = 0 .$$


However, for similar reasons as in the density un-coupled formulation of the free energy, the folded minimum is not
equivalent to the native state, and this assumption leads to pathologies when the previous treatment is implemented.
The introduction of a non-uniform density in the protein leads to an ensemble of folded states consisting of a dense
core (with $\eta = 1$) and an expanded halo (of dilute densities $\sim 0.05$), as determined by eq. (III.3) at the reaction
coordinates of the folded state ensemble. Folded states with this structure of a core containing nearly all the contacts,
and dangling loops or ends, are not entirely inconsistent with what is known about real folded proteins, which consist
of regions of sequence with well-defined spatial structure (e.g. the tertiary arrangement of helical segments) along
with regions of somewhat greater entropy density with not as well-defined structure (e.g. the "turns" in a helical
protein). However, one should still keep in mind the previous comment regarding the simplifying assumption of a
non-interacting halo in the theory of collapse.

Viewing the problem from a somewhat different angle, we can seek energetic parameters comparable to the lattice 
simulation values which give a double well free energy surface in the coordinates Q and z, with a barrier position and 
height consistent with simulations and experiments. 

The result of this is shown in Fig. 4B, which shows the free energy surface at $T_F$ obtained from the parameters
$(\epsilon, \bar\epsilon, \delta\epsilon_n, T_F) = (0.9, -2.8, -1.6, 1.51)$. The gap to roughness ratio for this minimal model is $|\delta\epsilon_n|/\epsilon = 1.8$ (satisfying
the conditions for global foldability). The system has a double well structure with a weakly first-order transition
between a collapsed globule, at $(Q_{mg}, z_{mg}) = (0.07, 0.97)$, and a somewhat expanded ($\approx$ 3 or 4 fewer contacts for the
27-mer) core-halo like folded state at $(Q_F, z_F) = (0.80, 0.83)$. For these energetic parameters, the folded state is more
energetically favored, with $E_f - E_{mg} = -3.6k_BT$, and thus less entropic ($T_F(S_f - S_{mg}) = -3.6k_BT$). For the folded
states, the density of the polymer is inhomogeneous, with a core containing about 23 monomers at density $\eta_c = 1$
and a halo of about 4 monomers at density $\eta_H$ effectively zero. A more exact theory would impose the constraint of
chain connectivity on the halo, which would significantly increase its density and decrease the expansion effect seen
here ($\Delta z = 0.14$). This expansion effect is naturally reduced as the average homopolymer attraction energy becomes
larger, and is also reduced for rougher landscapes, where non-native contacts with large variance of interaction energy
can contribute to deeper minima. In the folded ensemble, essentially all the entropy is in the dilute, expanded halo,
and all the energy in the dense core.

The core residues in the transition state ensemble at $(Q^*, z^*) = (0.5, 0.8)$ contain approximately $N z_N Q^*/z_{NQ} \approx 16$
monomers. Due to its position, the transition state ensemble has almost twice the thermal entropy as in the
simulations, so that it consists of $\sim 10^7$ thermally occupied states and $\sim 10^8$ configurational states. The free energy
barrier at this position is about $\Delta F = 6k_BT$, of which the energetic and entropic contributions, from equations (II.4)
and (II.5) respectively, are $\approx 1k_BT$ and $5k_BT$ respectively, the entropy loss to condense the critical core being the
more dominant factor here. The barrier height is naturally reduced for smaller homopolymeric biases, and also for
rougher landscapes, where traversing the rugged landscape to find the folded state becomes more of a second order,
less collective process.

The core-halo expansion effect is exaggerated for the parameters used in simulating a lattice model 27-mer. One 
explanation is that nativeness instills collapse not just in the subunits that have native contacts, but also in the 
surrounding polymer medium because of topological constraints not considered in the simple non-interacting core- 
halo model. We should also bear in mind that at high Q for the 27-mer there are significant lattice effects in the 
simulation. The non-self averaging behavior here cannot be predicted by the simple polymer model. 



B. Explicit 3-body effects 

It is interesting to investigate the effects of explicit many-body cooperativity on the folding funnel by introducing a
3-body interaction in addition to the pair interactions already present. Models with such partially explicit cooperativity
mimic the idea that only formed secondary structure units can couple, and have been introduced in lattice models
by Kolinski et al. jjj. 3-body interactions enter into the energetic contributions of (IV.1) as an additional $Q^2$ term in
the bias and roughness, and a $z^2$ term in the homopolymer attraction, so that those terms in the free energy become

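$$-\left[(1-\alpha)z + \alpha z^2\right]|\bar\epsilon| - \left[(1-\alpha)Q + \alpha Q^2\right] z_N\,|\delta\epsilon_n| - \frac{z_N\,\eta_H\,\epsilon^2}{2T}\left\{1 - \left[(1-\alpha)Q + \alpha Q^2\right]^2\right\}$$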



We can obtain the parameters characterizing the barrier as a function of the three-body coefficient $\alpha$. The collectivity
induced by 3-body interactions makes the energetic funnel steeper and narrower, and the gap bias is then effective
only for higher $Q$. This means that to maintain equilibrium between the globule and folded states the landscape must
either be less rough or more strongly biased. We choose to increase the stability gap $|\delta\epsilon_n|$ at constant roughness $\epsilon$
and temperature $T_F$. The magnitude of the gap energy is a roughly linearly increasing function of the coefficient of
the three body term $\alpha$, rising from 1.6 at $\alpha = 0$ to $\approx 2.1$ at $\alpha = 1$, in units where $T_F = 1.51$. With this correction
included, we find the barrier position $Q^*(\alpha)$ to be only weakly dependent on $\alpha$ (although as described above, $Q^*$
is not independent of $m$, the order of the m-body interactions), and the position of the folded state $Q_F$ to weakly
increase (Fig. 5A). As expected, the transition becomes more first-order-like with increasing $\alpha$ (the free energy
barrier increases), but there is a non-trivial dependence of the entropic and energetic contributions to the barrier (see
Fig. 5B).
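The statement that the gap bias becomes effective only at higher $Q$ can be seen from the mixing function $(1-\alpha)x + \alpha x^2$ alone. The short Python sketch below, with hypothetical function names and an illustrative value $z_N = 28/27$ for the 27-mer, evaluates the gap bias $-[(1-\alpha)Q + \alpha Q^2]\,z_N|\delta\epsilon_n|$ at the midpoint $Q = 0.5$ for pure pair ($\alpha = 0$) and pure 3-body ($\alpha = 1$) interactions.

```python
def cooperative_mix(x, alpha):
    """Mix linear (pair) and quadratic (3-body) dependence on x in [0, 1]."""
    return (1.0 - alpha) * x + alpha * x * x


def gap_bias(Q, alpha, z_N, d_eps_n):
    """Gap contribution to F/N with a fraction alpha of 3-body character."""
    return -cooperative_mix(Q, alpha) * z_N * abs(d_eps_n)


z_N = 28.0 / 27.0  # contacts per monomer of the compact 27-mer (illustrative)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, gap_bias(0.5, alpha, z_N, d_eps_n=-1.6))
```

At fixed gap the bias at $Q = 0.5$ is halved in going from $\alpha = 0$ to $\alpha = 1$, while the end points $Q = 0$ and $Q = 1$ are unchanged, which is why a larger $|\delta\epsilon_n|$ is needed to keep the globule and folded states in equilibrium.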



C. Dependence of the barrier on sequence length 

It is simple in our theory to vary the polymer sequence length. One recalculates $s(Q)$ at constant density
and inserts this, along with eq. (III.3) at the larger value of $N$, into the free energy (IV.1). Then one must rescale
the temperature, since in our model larger proteins fold at higher temperatures: by equation (II.9), the folding
temperature should scale with $N$ somewhat faster than $z_N$ (see Fig. 6A). Figure 6B shows that the resulting
free energy has a barrier whose position is a mildly decreasing function of $N$. An explanation for this is that in larger
polymers, entropy loss due to topological constraints is more dramatic in $Q$ because a smaller fraction of total native
contacts is necessary to constrain the polymer. That is, as $N$ increases, the number of bonds per monomer in the fully
constrained state ($Q = 1$) approaches the bulk limit of 2 (see eq. (III.2)), while only one bond is needed to constrain
a monomer. So this pushes the position $Q^*$ of the barrier in, roughly as $1/z_N$. Plotted along with the theoretical
curve are three experimental measurements of the barrier position. The square represents the measurement for
$\lambda$-repressor [17], a ~70 residue protein with largely helical structure. The triangle represents Chymotrypsin Inhibitor 2
(CI2) fTq ], a 64 residue protein with both $\alpha$-helices and $\beta$-sheets. The corresponding states analysis || shows that the
formation of secondary structure within these proteins makes them entropically analogous to the lattice 27-mer. Also
plotted in the figure (circle) is the experimental barrier measurement for Cytochrome C pq), a 104 residue helical
protein which is entropically similar to the 64-mer lattice model. The simple proposed model is in reasonable accord
with the general quantitative trend in the position of the transition state ensemble that is observed experimentally.
In Appendix A, we show for thoroughness that experimental plots of folding rate vs. equilibrium constant are indeed
a measure of the position of the transition state ensemble.

Figure 6C shows the general increasing trend of the barrier height as $N$ increases, as well as the entropic and
energetic contributions to the barrier. The increasing entropy and decreasing energy of the barrier indicate a more
significant expansion with $N$ at the transition state coordinates $(Q^*,z^*)$. In this model larger proteins expand more
to rearrange the backbone to the folded three-dimensional structure.



D. Dependence of the barrier on the stability gap, at fixed temperature and roughness. 

As the stability gap is increased at fixed temperature, folding approaches a downhill process, with the folded
ensemble becoming the global equilibrium state (see Fig. 7A). We can see from Fig. 7A that the barrier position and
height are decreasing functions of stability gap, with true downhill folding (zero barrier) occurring when $|\delta\epsilon_n|/\epsilon = 2.4$
or $|\delta\epsilon_n|/T_F s = 1.6$ for the 27-mer (see Figs. 7B and 7C). At $T_F$, $|\delta\epsilon_n|/T_F s = 1.2$. Thus, achieving downhill folding
requires a considerable change of stability - an estimate for a 60-mer protein would be an excess stability of $\sim 8k_BT_F$.

We can apply the equations of Appendix A to changes of the transition state free energy by modifying stability.
Fig. 8A shows a plot of the log of a normalized folding rate $\ln k$ vs. the log of the unfolding equilibrium constant
$\ln K_{eq}$, whose slope is a measure of the barrier position $Q^*$. The increasing magnitude of slope with increasing $\ln K_{eq}$
means that the barrier position is shifting towards the native state as the gap decreases. Figure 8B shows the actual
position of the barrier, along with $Q^*$ as derived from the slope of Fig. 8A from eq. (A.4).



E. Denaturation with increasing temperature. 



The probability $P_u$ for the protein to be in the unfolded globule state at temperature $T$ is

$$P_u = \left[1 + e^{-(F_f - F_u)/T}\right]^{-1}$$



where $F_u$ and $F_f$ are the free energies at temperature $T$ of the unfolded and folded minima (note that at $T_F$, $F_f = F_u$
and $P_u = 1/2$). This is used to obtain denaturation curves. For illustration, we make the simplifying assumptions
that both the folded and globule states are collapsed, making $P_u$ independent of $\bar\epsilon$, and that the folded and globule
states occur approximately at $Q_F = 1$ and $Q_{MG} = 0$. As the temperature is lowered, the molten globule freezes into
a low energy configuration at $T_g = \epsilon\sqrt{z_N(1-Q_{mg}^2)/(2s(Q_{mg}))} \approx \epsilon\sqrt{z_N/(2s_0)}$ (see Fig. 3), and the expression for
$P_u$ becomes one of equilibrium between two temperature independent states with the corresponding "Schottky" form
of the energy and specific heat:

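$$P_u = \begin{cases}
\left[1 + \exp(-N s_0)\,\exp\left\{N z_N\left(\dfrac{|\delta\epsilon_n|}{T} - \dfrac{\epsilon^2}{2T^2}\right)\right\}\right]^{-1}, & T_g < T \\[2ex]
\left[1 + \exp\left\{\dfrac{N z_N}{T}\left(|\delta\epsilon_n| - \epsilon\sqrt{\dfrac{2 s_0}{z_N}}\right)\right\}\right]^{-1}, & T < T_g
\end{cases} \qquad \text{(IV.3)}$$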



The condition that $T_F/T_G > 1$ gives eq. (II.10). Using the glass temperature of the globule state, this is equivalent to

$$T_g \ge \frac{\epsilon^2}{|\delta\epsilon_n|}$$

which is the temperature where the high T expression for P u has a minimum. Hence cold-denaturation will not be 
seen in the constant density model (as it would if there were no glass transition), and P u will always decrease to zero 
at low temperatures. 
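The location of this minimum can be checked directly. As a short worked step, assuming the high temperature form of $P_u$ depends on $T$ only through the exponent $g(T) = |\delta\epsilon_n|/T - \epsilon^2/2T^2$, and taking $T_g \approx \epsilon\sqrt{z_N/(2s_0)}$ for the globule (the symbol $T_{min}$ is introduced here only for this check):

```latex
\begin{align*}
\frac{dg}{dT} &= -\frac{|\delta\epsilon_n|}{T^2} + \frac{\epsilon^2}{T^3} = 0
  \quad\Longrightarrow\quad T_{min} = \frac{\epsilon^2}{|\delta\epsilon_n|} , \\
T_g \ge T_{min} &\iff \epsilon\sqrt{\frac{z_N}{2 s_0}} \ge \frac{\epsilon^2}{|\delta\epsilon_n|}
  \iff \frac{|\delta\epsilon_n|}{\epsilon} \ge \sqrt{\frac{2 s_0}{z_N}} .
\end{align*}
```

Since $g(T)$ is maximal at $T_{min}$, $P_u$ is minimal there; when $T_g \ge T_{min}$ the globule freezes before this minimum is reached on cooling, so the rise of $P_u$ below $T_{min}$ is never realized.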

In the limit of large $T$, (IV.3) becomes $\approx 1/(1 + \exp(-N s_0)) \approx 1$, indicating denaturation. At small $T$, (IV.3) tends
to zero as $\exp[-(\mathrm{const.} \times N/T)]$. The denaturation curve from equation (IV.3) is plotted in Fig. 9 for two proteins
of different roughness. The ratios of widths to folding temperatures $\Delta T/T_F$ are about 0.2 and 0.4 for the parameters
used in the figure. These are consistent with the simulation values. J2G|
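A denaturation curve of this two-branch kind can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not a reproduction of the figure: the function names are hypothetical, the high temperature branch is taken as $[1 + e^{-Ns_0} e^{N z_N(|\delta\epsilon_n|/T - \epsilon^2/2T^2)}]^{-1}$ and the frozen branch as $[1 + e^{(N z_N/T)(|\delta\epsilon_n| - \epsilon\sqrt{2s_0/z_N})}]^{-1}$, with $T_g = \epsilon\sqrt{z_N/(2s_0)}$, and the values $s_0 = 0.8$ and $z_N = 28/27$ are illustrative choices.

```python
import math


def glass_temperature(eps, z_N, s0):
    """Glass temperature of the globule (Q = 0), assumed form."""
    return eps * math.sqrt(z_N / (2.0 * s0))


def folding_temperature(eps, d_eps_n, z_N, s0):
    """Upper root of F_f = F_u on the high temperature branch."""
    gap = abs(d_eps_n)
    disc = (z_N * gap) ** 2 - 2.0 * s0 * z_N * eps ** 2
    return (z_N * gap + math.sqrt(disc)) / (2.0 * s0)


def _inv_one_plus_exp(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0.0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def p_unfolded(T, N, eps, d_eps_n, z_N, s0):
    """Probability of the unfolded globule, with a frozen globule below T_g."""
    gap = abs(d_eps_n)
    if T > glass_temperature(eps, z_N, s0):
        x = -N * s0 + N * z_N * (gap / T - eps ** 2 / (2.0 * T ** 2))
    else:
        x = (N * z_N / T) * (gap - eps * math.sqrt(2.0 * s0 / z_N))
    return _inv_one_plus_exp(x)


params = dict(N=27, eps=0.9, d_eps_n=-1.8, z_N=28.0 / 27.0, s0=0.8)
T_F = folding_temperature(params["eps"], params["d_eps_n"],
                          params["z_N"], params["s0"])
T_g = glass_temperature(params["eps"], params["z_N"], params["s0"])
for T in (0.5 * T_g, T_g, 0.9 * T_F, T_F, 1.1 * T_F):
    print(round(T, 3), p_unfolded(T, **params))
```

The two branches match at $T_g$ by construction, and $P_u = 1/2$ at the computed $T_F$; the value of $T_F$ obtained this way depends on the assumed $s_0$ and need not coincide with the simulational 1.51.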

The expanded halo of the native states modifies the denaturation behavior. The increased entropy of the folded 
state with a shrunken core leads to a partially re-entrant folding transition, which is quite weak. We believe this to 
be an artifact of the over-estimated entropy of the halo. 

V. CONCLUSION



In this paper we have shown that if the energy of a given configuration of a random heteropolymer is known to be 
lower than expected for the ground state of a completely random sequence (i.e. the protein is minimally frustrated), 
then correlations in the energies of similar configurations lead to a funnelled landscape topography. The interplay of 
entropic loss and energetic loss as the system approaches the native state results in a free energy surface with two- 
state behavior between an unfolded globule of large entropy, and a folded ensemble of lower energy but non-negligible 
entropy. The weakly first-order transition is characterized by a free energy barrier which functions as a "bottle-neck" 
in the folding process. 

The barrier is small compared with the total thermal energy of the system - on the order of a few $k_BT$ for smaller
proteins of sequence length about 60 monomers. For these small proteins the model predicts a position of the barrier
$Q^*$ about half-way between the unfolded and native states ($Q^* \approx 1/2$). For larger proteins, the barrier in $Q$ (the
fraction of native contacts) moves further away from the native state towards the molten globule ensemble roughly
as $1/z_N$, due essentially to the fact that the entropy decrease per contact is initially independent of $N$. Experimental
measurements of the barrier for fast folding proteins are consistent with this predicted shift in position with increasing
sequence length.

The folded, unfolded and transition states are not single configurations but ensembles of many configurations. The
transition state ensemble according to this theory consists of about $10^7$ thermally accessible states for a small protein
such as $\lambda$-repressor, roughly the combinatorial number of 16-mer core residues in the transition state ensemble of
the 27-mer minimal model. This observation is in harmony with the picture of a generally de-localized ensemble of
transition state nuclei in contact sequence space, a subject investigated recently by various authors [29].

A simple theory of collapse was introduced to couple protein density with nativeness. This resulted in an expansion 
due to dangling residues rather than contraction in density in the process of folding. The expansion is overestimated 
due to the neglect of some of the effects of chain-connectivity in the present halo model, and a poor description of 
the lattice-dependent high Q protein topologies. During folding, a dense inner native core forms, which grows while 
possibly interchanging some native contacts with others upon completion of folding. This core is surrounded by a 
halo of non-native polymer which expands in the folding process. The folded free energy minimum is only about 80% 
native in the model when parameters are chosen to fit simulated free energy curves for the 27-mer. 

Explicitly cooperative interactions were shown to enhance the first-order nature of the transition through an increase
in the size of the barrier, and a shift towards more native-like transition state ensembles (i.e. at higher Q*). For the 
constant density scenario the barrier becomes almost entirely entropic when the order m of the m-body interactions 
becomes large, and the transition state ensemble becomes correspondingly more native-like. In the energy landscape 
picture, as explicit cooperativity increases, the protein folding funnel disappears, and the landscape tends towards a 
golf-course topography with energetic correlations less effective and more short range in Q space. The correlation of 
stability gaps and T F /T G ratios with kinetic foldability is true only for fixed m much less than N. 

A full treatment of the barrier as a function of the 3 energetic parameters ($\epsilon$, $\bar\epsilon$, $\delta\epsilon_n$) plus temperature $T$ would
involve the analysis of a multi-dimensional surface defining folding equilibrium in the space of these parameters. We
shall return to this issue in the future, but we have deferred it for now in favor of the simpler analysis of seeking
trends in the position and height of the barrier as a function of individual parameters such as $\delta\epsilon_n$ and $T$.

A. APPENDIX A



For small, $Q$ dependent changes in the free energy (IV.1), e.g. changes in temperature, we can approximate the
change in the free energy at the position of the barrier $\delta F(Q^*)$ as a linear interpolation between the free energy
changes of the unfolded and folded minima $\delta F_u$ and $\delta F_f$, here estimated to be at $Q_u = 0$ and $Q_f = 1$ respectively:

$$\delta F(Q^*) = Q^*\,\delta F_f + (1 - Q^*)\,\delta F_u \qquad \text{(A.1)}$$



Furthermore let us approximate the kinetic folding time by the thermodynamic folding time [29]

$$\tau = t(Q^*)\,\exp\left\{\left[F(Q^*) - F_u\right]/T\right\}$$

where $Q^*$ is the position of the barrier and $t(Q^*)$ is the lifetime of the microstates of the transition ensemble. Then the
log of the folding rate $k$ is $\propto F_u - F(Q^*)$. The equilibrium constant for the unfolding transition $K_{eq}$ is the probability
to be in the unfolded minimum over the probability to be in the folded minimum, and so $\ln K_{eq} \propto F_f - F_u$. If we plot
$\ln k$ vs. $\ln K_{eq}$, the assumption of a linear free energy relation (A.1) and a stable barrier position results in a linear
dependence of rate upon equilibrium constant, with slope

$$\frac{\delta[F_u - F(Q^*)]}{\delta[F_f - F_u]} = \frac{\delta F_u - \delta F(Q^*)}{\delta F_f - \delta F_u} = -Q^* \qquad \text{(A.2)}$$

so that experimental slopes of folding rates vs. unfolding equilibrium constants are indeed a measure of the position
of the barrier in our theory.



If the unfolded and folded states are not assumed to be at $Q = 0$ and $Q = 1$ respectively, equations (A.1) and (A.2)
are modified to

$$\delta F(Q^*) = \left(\frac{Q^* - Q_u}{Q_f - Q_u}\right)\delta F_f + \left(\frac{Q_f - Q^*}{Q_f - Q_u}\right)\delta F_u \qquad \text{(A.3)}$$

and

$$\frac{\delta[F_u - F(Q^*)]}{\delta[F_f - F_u]} = -\left(\frac{Q^* - Q_u}{Q_f - Q_u}\right) \qquad \text{(A.4)}$$

where $Q_u$ and $Q_f$ are the respective positions of the unfolded and folded states. So one can obtain the barrier position
from the slope of a plot of $\ln k$ vs. $\ln K_{eq}$, given the positions of the unfolded and folded states.
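The procedure can be illustrated with synthetic data. In the Python sketch below (hypothetical function names; the free energy shifts are made-up numbers in units of $T$, and the positions $Q_u = 0.07$, $Q_f = 0.80$, $Q^* = 0.5$ are borrowed from the 27-mer example only for illustration), the barrier shift is generated by linear interpolation between the shifts of the two minima, the slope of $\ln k$ vs. $\ln K_{eq}$ is fitted by least squares, and $Q^*$ is recovered from the relation slope $= -(Q^* - Q_u)/(Q_f - Q_u)$.

```python
def barrier_shift(dF_f, dF_u, Q_star, Q_u=0.0, Q_f=1.0):
    """Linear interpolation of the free energy change at the barrier."""
    w = (Q_star - Q_u) / (Q_f - Q_u)
    return w * dF_f + (1.0 - w) * dF_u


def least_squares_slope(xs, ys):
    """Slope of the best straight line through the points (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def barrier_position_from_slope(slope, Q_u=0.0, Q_f=1.0):
    """Invert slope = -(Q* - Q_u) / (Q_f - Q_u) for Q*."""
    return Q_u - slope * (Q_f - Q_u)


Q_u, Q_f, Q_true = 0.07, 0.80, 0.5
ln_K, ln_k = [], []
for step in range(-4, 5):
    dF_f = 0.5 * step            # synthetic shift of the folded minimum
    dF_u = 0.1 * step            # synthetic shift of the unfolded minimum
    dF_b = barrier_shift(dF_f, dF_u, Q_true, Q_u, Q_f)
    ln_K.append(dF_f - dF_u)     # change in ln K_eq (unfolding)
    ln_k.append(dF_u - dF_b)     # change in ln k (folding rate)

slope = least_squares_slope(ln_K, ln_k)
print(slope, barrier_position_from_slope(slope, Q_u, Q_f))
```

The common factor $1/T$ in both logarithms cancels in the slope, so only the relative shifts of the three free energies matter.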

FIGURE CAPTIONS



Figure 1. (A) Free energy per monomer $F/N$ for a 27-mer, in units of $k_BT_F$, as a function of $Q$, at constant density $\eta = 1$,
with $s = 0.8$, for protein-like energetic parameters $(\delta\epsilon_n, \epsilon) = (-2.28, 1.55)$. For these parameters $T_F \approx |\delta\epsilon_n|$.
For illustrative purposes, two values of m-body interactions are chosen: (solid line) pure 3-body interactions,
(dashed line) pure 12-body interactions. Note the trends in height and position of the barrier, and note how
in the $m = 12$ case the free energy curve is essentially $-T$ times the entropy curve $s(Q)$ until $Q$ is large.

(B) Position of the transition state ensemble $Q^*$ along the reaction coordinate $Q$ as a function of the explicit
cooperativity in pure m-body forces, $m$. The fact that the asymptotic limit $Q_{max}$ is less than one is due to the
finite size of the system, so that $Q$, the fraction of native contacts, is not a continuous parameter.

(C) The free energy barrier height $\Delta F$ in units of $k_BT_F$ as a function of the explicit cooperativity of the m-body
force, $m$. The barrier height rises to the limit of $S(Q = 0)$ as $m \to \infty$, when it becomes completely entropic.
Also shown are the energetic (dashed) and entropic (solid) contributions to the barrier.

Figure 2. A model of the partially native protein can be pictured as a frozen native core surrounded by a halo of non-native
polymer of variable density.

Figure 3. The folding temperature $T_F$ and glass transition temperature $T_G$ as a function of the fraction of native contacts
$Q$ and the total contacts per monomer $z$. The folding temperature is above the glass temperature (II.10) for
essentially all values of $Q$ and $z$, for protein-like energetic parameters. Here $\epsilon \approx 1$ and $\delta\epsilon_n = -1.6$.

Figure 4. (A) The free energy vs. $Q$ and $z$, at the folding temperature $T_F$, from simulations.

(B) Free energy surface at $T_F$ for the 27-mer, obtained from eq. (IV.1) with the parameters $(\epsilon, \bar\epsilon, \delta\epsilon_n, T_F) =
(0.9, -2.8, -1.6, 1.51)$. The surface has a double well structure with a transition state ensemble at $Q^* = 0.5$.

Figure 5. (A) Position of the barrier $Q^*$ and position of the folded free energy minimum $Q_F$ as a function of the three
body coefficient $\alpha$.

(B) Free energy barrier $\Delta F = F(Q^*,z^*) - F(Q_{MG},z_{MG})$ in units of $k_BT_F$, and its energetic and entropic
contributions, for the 27-mer, as a function of $\alpha$. There are two values of $\alpha$ where the barrier is completely
entropic, which define a region where the transition ensemble has a lower average energy than the molten globule.

Figure 6. (A) The folding temperature $T_F$ is an increasing function of polymer sequence length $N$.

(B) Position of the barrier $Q^*$ as a function of sequence length $N$. The solid line is the theory as determined
by eq. (IV.1), and the points marked by polygons are experimental results (see text).

(C) Free energy barrier height $\Delta F$ in units of $k_BT_F$, as a function of sequence length $N$, along with its energetic
and entropic contributions.

Figure 7. (A) Two plots of the free energy vs. $Q$ for the 27-mer with $\epsilon = 0.9$ and $T = 1.5$, at fixed density $\eta = 0.95$.
The upper curve is the free energy when the stability gap $\delta\epsilon_n = 1.75$, and $\delta\epsilon_n = 2.1$ in the lower curve. From
the figure we can see that as $\delta\epsilon_n$ increases and the folding becomes downhill, the barrier shifts to lower $Q$ and
decreases in height.

(B) The positions of the barrier $Q^*$ and folded state $Q_F$ as a function of energy gap $\delta\epsilon_n$, for the system described
in (A), where $T = 1.5$ and $\epsilon = 0.9$.

(C) Free energy barrier in units of $k_BT$ vs. stability gap $\delta\epsilon_n$. The short dashed line is the entropic contribution
to the barrier, and the long dashed line is minus the energetic contribution.

Figure 8. (A) Plot of the logarithm of the folding rate vs. the logarithm of the unfolding equilibrium constant.

(B) Reading the slope from (A) gives a measure of the position of the barrier $Q^*$. Also plotted is the actual
value of $Q^*$ directly calculated from the free energy curves. The values compare well for most values of the gap
where the free energy has a double-well structure.



Figure 9. (solid line) Probability to be in the unfolded state (eq. (IV.3)) as a function of temperature for the 27-mer
with energetic parameters $(\epsilon, \delta\epsilon_n, T_F) = (0.9, -1.8, 1.51)$.

(dashed line) Same probability for a protein with a rougher energy landscape, which has a lower folding
temperature and somewhat broader denaturation curve, $(\epsilon, \delta\epsilon_n, T_F) = (1.3, -2.0, 1.3)$.

FIG. 2.

FIG. 3.